How to use Stata efficiently

Programming languages are precise. Unlike with natural language, the computer will not tolerate syntax or spelling errors. For this reason, it’s important to have good programming habits so that you 1) avoid mistakes and 2) find them more easily when they do happen.

1. How to start your dofile

Always begin your dofile in the same way (just copy and paste from old dofiles): 1) some basic information, 2) packages necessary to run the code, 3) preliminary settings, and 4) the file structure. Example:


/*******************************************************************************
********************************************************************************

                     Lab 1 for Econometrics class

                         Written by Eirik Berger
                           PhD Research Scholar
                  The Norwegian School of Economics (NHH)

********************************************************************************    
*******************************************************************************/

* Extra packages used (uncomment if not already installed)
* ssc install estout

* Preliminary settings
clear all
set matsize 1600
set scheme s1color
set more off

********************************************************************************    
********************************************************************************    

* Top folder for the project
global top "/Users/eirikberger/Dropbox/0_AKADEMISK/department_work/ECN402_spring_2020/lab0"

* Other globals: based on global top
global data "$top/data"
global dofiles  "$top/dofiles" 
global figures  "$top/figures" 
global tables  "$top/tables"

cd "$top"

/*******************************************************************************    
********************************************************************************
********************************************************************************    
********************************************************************************    

Question 1
Part a

********************************************************************************    
********************************************************************************
********************************************************************************    
*******************************************************************************/
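One way to avoid commenting the install line in and out is to let the dofile check for the package itself. This is only a sketch: it assumes the machine can reach SSC the first time it runs, and it relies on “capture” (explained in section 6) to test whether esttab can be found.

```stata
* Install estout only if esttab cannot be found on this machine.
* Assumption: an internet connection is available the first time this runs.
capture which esttab          // _rc is 0 if esttab is found, nonzero otherwise
if _rc != 0 {
    ssc install estout
}
which esttab                  // shows where esttab is stored; errors if the install failed
```

With this in the preliminaries, the same dofile runs unchanged on a new computer.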

2. Load and check your data

Always add “clear” as an option when you open a .dta file. Also, don’t include the full path: it is cleaner to either use globals (as in the dofile header above) or the “cd” (change directory) command.

use "$data/BWGHT.DTA", clear
list faminc cigtax cigprice in 1/10, noobs 
  +----------------------------+
  | faminc   cigtax   cigprice |
  |----------------------------|
  |   13.5     16.5      122.3 |
  |    7.5     16.5      122.3 |
  |     .5     16.5      122.3 |
  |   15.5     16.5      122.3 |
  |   27.5     16.5      122.3 |
  |----------------------------|
  |    7.5     16.5      122.3 |
  |     65     16.5      122.3 |
  |   27.5     16.5      122.3 |
  |   27.5     16.5      122.3 |
  |   37.5     16.5      122.3 |
  +----------------------------+
sum faminc
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      faminc |      1,388    29.02666    18.73928         .5         65

Or simply use the “browse” command (“br” for short) to look at your dataset directly.
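Beyond “list”, “sum” and “browse”, a few other built-in commands are handy for a first check of a dataset. The sketch below uses auto.dta, which ships with Stata, as a stand-in (an assumption, so that it runs without BWGHT.DTA); the variable names belong to that dataset, not to the lab data.

```stata
* Stand-in data: auto.dta ships with Stata
sysuse auto, clear

describe                      // variable names, storage types and labels
codebook rep78, compact       // range and number of missing values for one variable
misstable summarize           // lists the variables that have missing values

* assert stops the dofile with an error if the condition fails for any row
assert price > 0 & !missing(price)
```

Putting an “assert” right after loading the data is a cheap way to catch a wrong or changed file early.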

3. How to re-use results (using locals)

Sometimes we want to save certain numbers coming from Stata commands for later use. Example: What is the share of white mothers who smoke? First, we find the number of white mothers.

count if white==1
  1,089

Then the number of white mothers who smoke.

count if white==1 & cigs>0
  165

I copy the answers from the above commands and calculate the share using the “display” command. Note that display works as a calculator when the expression is not wrapped in quotation marks (“”).

display 165/1089
.15151515

However, we can do this without any manual copy-and-paste work. Using the “return list” command after running the “count” command, we find that the count command returns (saves) the scalar “r(N)”.

count if white==1
  1,089
return list
scalars:
                  r(N) =  1089

Similarly for the “sum” command:

sum cigs
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        cigs |      1,388    2.087176    5.972688          0         50
return list
scalars:
                  r(N) =  1388
              r(sum_w) =  1388
               r(mean) =  2.087175792507205
                r(Var) =  35.67300052567169
                 r(sd) =  5.972687881152981
                r(min) =  0
                r(max) =  50
                r(sum) =  2897

We use the “local” command to save the mean (r(mean)) as “groupmean”. You can check the content of “groupmean” by using the display command.

local groupmean = r(mean)
display "`groupmean'"
2.087175792507205

Put it all together and we can calculate the share of smokers among white mothers in one go:

count if white==1
local white = r(N)
display "Saved number is: " `white'
count if white==1 & cigs>0
local whiteandsmoke = r(N)
display "Saved number is: " `whiteandsmoke'
display "The fraction of white mothers who smoke is: " `whiteandsmoke'/`white'
display "or in percent " 100*(`whiteandsmoke'/`white') "%"
  1,089

Saved number is: 1089

  165

Saved number is: 165

The fraction of white mothers who smoke is: .15151515

or in percent 15.151515%
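There is also a shorter route to the same share, since the mean of a 0/1 indicator equals the share of ones. The sketch below runs on a tiny made-up dataset so that it works on its own; the six rows and the variable name “smoker” are assumptions for illustration, and only “white” and “cigs” come from the lab data.

```stata
* Toy data (made up): 4 white mothers, 2 of whom smoke
clear
input white cigs
1 0
1 10
1 0
1 5
0 0
0 20
end

* Route 1: two counts saved in locals
count if white==1
local white = r(N)
count if white==1 & cigs>0
local whiteandsmoke = r(N)
display "Share (counts): " `whiteandsmoke'/`white'

* Route 2: the mean of a 0/1 indicator is the share of ones
generate smoker = cigs > 0 if !missing(cigs)
summarize smoker if white==1
local share = r(mean)
display "Share (mean):   " `share'
```

Both routes give .5 on the toy data. The “if !missing(cigs)” part keeps rows with unknown smoking status out of the indicator.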

Important: When using locals you must either 1) run the entire do-file in one go or 2) run the line that creates the local (like “local whiteandsmoke = r(N)”) together with the line that uses it (like “display `whiteandsmoke'”). For the second option, select all lines from the “local …” line down to the line where you use the number, and run them as one selection. This is because locals are short-lived: Stata deletes them as soon as it has finished running the selected code.
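If you prefer to run your code a few lines at a time, a scalar is one alternative, because scalars stay in memory for the rest of the session. A minimal sketch, with “n_white” as a made-up name and a three-row toy dataset standing in for the real data:

```stata
* Toy data (made up): two white mothers, one non-white
clear
input white
1
1
0
end

count if white==1
scalar n_white = r(N)         // stays in memory until "scalar drop" or "clear all"
display "Saved number is: " n_white
scalar list n_white
```

Pick scalar names that do not match any variable name, since Stata looks for a variable of that name first.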

4. Using global to deal with folder structures

Globals are a great way of keeping your dofile clear and the file structure organized. With globals you save a string (text) to a name, like at the beginning of my dofile. Use $[name] to retrieve it. In the following example, I save the path to my working directory and then change the directory:

global wd "/Users/eirikberger/Dropbox/0_AKADEMISK/department_work/ECN402_spring_2020/lab0"
cd "$wd"

Similarly, to open a dataset in the data folder (already saved as a global), you write:

use "$data/BWGHT.DTA", clear
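The same globals work for saving files, and the folders can be created from inside the dofile. In this sketch the current working directory stands in for the project’s top folder and auto.dta (shipped with Stata) stands in for the data; both are assumptions made so the example runs anywhere, and “auto_copy.dta” is a made-up file name.

```stata
* Assumption: the current working directory plays the role of the top folder
global top "`c(pwd)'"
global data "$top/data"

capture mkdir "$data"         // creates the folder; capture ignores the error if it already exists

sysuse auto, clear            // stand-in dataset that ships with Stata
save "$data/auto_copy.dta", replace
use "$data/auto_copy.dta", clear
```

Only the line defining “top” has to change when the project moves to another computer.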

5. Comment a lot!

Use “*” to write comments in your dofile. You can also use it to “turn off” a command by placing “*” in front of the command to save it for later. Use “//” if you want to comment on the same line as a command (see the log example in point no. 6). You can also create a full section of text only (parts that are not interpreted as code) by writing “/*” to start the section and “*/” to end it (see point no. 1 for an example).

Code is hard to interpret later, both for future you and for co-authors. Note: You have to hand in your dofile for assignments (as an appendix in your main pdf file). The dofile should 1) reproduce your results, 2) make clear what you have done (use comments), and 3) look tidy and nice.
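The three comment styles side by side, as a small sketch. It uses auto.dta, which ships with Stata, only so that the commands have something to run on.

```stata
* A whole-line comment: Stata skips this line
sysuse auto, clear            // stand-in dataset that ships with Stata

* summarize mpg               (a command turned off with a leading star)
summarize price               // a comment on the same line as a command

/*
A block of text only. Nothing in here is run,
so it can span as many lines as you like.
*/
```

Note that “//” comments are meant for dofiles; they do not work when typed directly into the command window.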

6. Logs

You have to hand in the log from running your full dofile (as an appendix in your main pdf file). It might be easiest to add the commands below when you create your dofile, but keep them inactive (put “*” in front of them) until you are done and run the full dofile in one go.

capture log close           // Could add this to your preliminaries. Closes the log file if one is already open
log using introlab, text replace            // Opens a log and saves it as a text file
* ... the rest of your dofile goes here ...
capture log close          // Closes and saves the log file if one is open
exit         // Ends the dofile here; anything below this line is not run

Note that you might find the “capture” command useful at some point, especially when your dofile has to deal with several different datasets. It runs the command that follows it on the same line and suppresses any error (and any output) from that command, so instead of stopping, Stata simply continues with the next line. One example where this is useful is dropping a variable only if it exists. Suppose your dataset might contain a bwght variable and an income variable, but you are not certain:

drop bwght

This command works because there is indeed a variable named bwght. What about income:

drop income

We get an error because there is no such variable. To run the code without any error, we run:

capture drop income
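After “capture”, Stata stores the return code of the command in “_rc” (0 means no error), so the dofile can react to what happened instead of only silencing it. A sketch using auto.dta, shipped with Stata, as a stand-in dataset that has no income variable:

```stata
sysuse auto, clear            // stand-in dataset; it has no variable called income

capture drop income           // the error is suppressed and the dofile continues
display "Return code from drop: " _rc    // nonzero, because the drop failed

* Branch on whether the variable exists instead of silencing the error blindly
capture confirm variable income
if _rc == 0 {
    display "income exists"
}
else {
    display "income not found, skipping"
}
```

Use “capture” sparingly: it hides every error, including the ones caused by typos.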

7. Use Google and the “help” command

Just do it. Ask for help if you can’t find it there.

8. Delimit

In cases where one command goes across several lines (for example when you are producing figures), you can use #delimit ; and #delimit cr. Between the two, a command ends at the “;” rather than at the end of the line. Example:

#delimit ;
esttab lin_all lin_high using A1_tab1.tex, 
  b(%4.2f) se(%4.2f) r2(%4.2f) scalars(Prediction) replace label star(* 0.10 ** 0.05 *** 0.01) 
  mtitles("All countries" "High-income countries") 
  title("Linear regression model: Outcome is labour productivity in agriculture");
#delimit cr

Remember to add the “;” at the end of the last line in the command.
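An alternative that needs no change of delimiter is to end each unfinished line with “///”, which tells Stata that the command continues on the next line. A sketch with auto.dta (shipped with Stata) as a stand-in; the regression itself is just a placeholder:

```stata
sysuse auto, clear            // stand-in dataset that ships with Stata

* Each /// tells Stata that the command continues on the next line
regress price mpg weight ///
    length foreign,      ///
    vce(robust)
```

Like “//”, the “///” continuation is meant for dofiles, and it needs a space in front of it.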

9. Logical statements

Remember the logical statements:

  • & and
  • | or
  • != not equal to
  • == equal to
  • > larger than
  • < smaller than
  • >= larger than or equal to
  • <= smaller than or equal to
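One pitfall when combining these operators: Stata treats a missing value (“.”) as larger than any number, so conditions with “>” or “>=” are true for missing values. The sketch below shows this on a made-up five-row dataset; only the variable names “white” and “cigs” come from the lab data.

```stata
* Toy data (made up); the third row has a missing value for cigs
clear
input white cigs
1 0
1 10
1 .
0 5
0 0
end

count if white==1 & cigs>0                   // 2: the row with missing cigs is counted too
count if white==1 & cigs>0 & !missing(cigs)  // 1: missing values excluded
count if white!=1 | cigs>=10                 // 4: either condition may hold
count if cigs<=5                             // 3: missing is never "smaller than"
```

The BWGHT data have no missing values for cigs, so the counts in section 3 are not affected, but it is a good habit to add “!missing()” whenever you use “>” or “>=”.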

From last lecture

Vincent asked me to go through how you run an f-test in Stata. I think this is an excellent opportunity to learn how to figure out these kinds of questions by yourself.

1. T-test

If you search for “t-test” on Google, you will find this page quite soon (the Stata manual entry for the command “ttest”): https://www.stata.com/manuals13/rttest.pdf. It gives you all the information you need about the syntax of the command, plus good examples of how to use it.
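To give an idea of what the manual page leads to, here is a sketch of a two-sample t-test. It uses auto.dta, which ships with Stata, as a stand-in, so the variables “mpg” and “foreign” are not from the lab data.

```stata
sysuse auto, clear            // stand-in dataset that ships with Stata

* Two-sample t-test: is mean mileage the same for domestic and foreign cars?
ttest mpg, by(foreign)

return list                   // ttest saves its results in r(), just like count and sum
display "t statistic: " r(t)
display "p-value (two-sided): " r(p)
```

The saved results can be put into locals in the same way as r(N) and r(mean) in section 3.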

2. F-test

How should we learn to use the F-test? You guessed it: Just use the same trick as for the t-test and find this help page (the Stata manual entry for the command “test”): https://www.stata.com/manuals13/rtest.pdf

reg faminc cigtax cigprice bwght male white
      Source |       SS           df       MS      Number of obs   =     1,388
-------------+----------------------------------   F(5, 1382)      =     40.07
       Model |  61667.9131         5  12333.5826   Prob > F        =    0.0000
    Residual |  425392.101     1,382  307.809045   R-squared       =    0.1266
-------------+----------------------------------   Adj R-squared   =    0.1235
       Total |  487060.014     1,387  351.160789   Root MSE        =    17.544

------------------------------------------------------------------------------
      faminc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cigtax |  -.7546118   .1254443    -6.02   0.000    -1.000694     -.50853
    cigprice |   .6144594   .0953262     6.45   0.000     .4274597    .8014591
       bwght |   .0674765   .0234134     2.88   0.004     .0215469     .113406
        male |  -1.750652   .9463898    -1.85   0.065    -3.607168    .1058639
       white |   13.45813   1.163566    11.57   0.000     11.17558    15.74067
       _cons |   -54.0982   10.69192    -5.06   0.000    -75.07234   -33.12405
------------------------------------------------------------------------------

To use the f-test to test the joint null hypothesis that the coefficients on both “cigtax” and “cigprice” are equal to zero, we use the “test” command. It’s really intuitive:

test (cigtax = 0) (cigprice = 0)
 ( 1)  cigtax = 0
 ( 2)  cigprice = 0

       F(  2,  1382) =   21.08
            Prob > F =    0.0000

Or with even less code:

test cigtax cigprice
 ( 1)  cigtax = 0
 ( 2)  cigprice = 0

       F(  2,  1382) =   21.08
            Prob > F =    0.0000

What about testing if they are equal to each other?

test cigtax = cigprice
 ( 1)  cigtax - cigprice = 0

       F(  1,  1382) =   40.99
            Prob > F =    0.0000
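When there is only one restriction, the F-statistic is the square of the t-statistic for the same hypothesis. The sketch below checks this using “lincom”, which estimates a linear combination of coefficients. It runs on auto.dta (shipped with Stata) as a stand-in, and the scalar names “Fstat” and “tstat” are made up for the example.

```stata
sysuse auto, clear            // stand-in dataset that ships with Stata
regress price mpg weight foreign

* F-test of a single restriction: coefficient on mpg equals coefficient on weight
test mpg = weight
scalar Fstat = r(F)           // test leaves F, degrees of freedom and p-value in r()

* The same hypothesis as a t-test on the difference of the two coefficients
lincom mpg - weight
scalar tstat = r(estimate)/r(se)

display "F = " Fstat "   t^2 = " tstat^2
```

The two numbers agree, which is a quick sanity check that “test” is testing what you think it is.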

Assignment 1

You should always attempt to fully automate the production of your output, as this will save you a lot of time if you have to do lots of revisions. Note: With LaTeX (Word for academia), you can go even further in making your workflow efficient. You might be okay with less efficiency in this course, but it could save you days of work when writing an empirical master’s or PhD thesis.

1. Produce and export regression results

Based on the hint dofile of assignment 1:

* Install estout (place "*" in front of ssc... once installed)
ssc install estout

Run two regressions and store the estimates.

reg bwght cigs
estimates store reg1
      Source |       SS           df       MS      Number of obs   =     1,388
-------------+----------------------------------   F(1, 1386)      =     32.24
       Model |  13060.4194         1  13060.4194   Prob > F        =    0.0000
    Residual |    561551.3     1,386  405.159668   R-squared       =    0.0227
-------------+----------------------------------   Adj R-squared   =    0.0220
       Total |   574611.72     1,387  414.283864   Root MSE        =    20.129

------------------------------------------------------------------------------
       bwght |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cigs |  -.5137721   .0904909    -5.68   0.000    -.6912861   -.3362581
       _cons |   119.7719   .5723407   209.27   0.000     118.6492    120.8946
------------------------------------------------------------------------------
reg bwght cigs white
estimates store reg2
      Source |       SS           df       MS      Number of obs   =     1,388
-------------+----------------------------------   F(2, 1385)      =     27.47
       Model |  21925.2578         2  10962.6289   Prob > F        =    0.0000
    Residual |  552686.462     1,385  399.051597   R-squared       =    0.0382
-------------+----------------------------------   Adj R-squared   =    0.0368
       Total |   574611.72     1,387  414.283864   Root MSE        =    19.976

------------------------------------------------------------------------------
       bwght |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cigs |  -.5059517   .0898216    -5.63   0.000    -.6821527   -.3297507
       white |   6.148295   1.304469     4.71   0.000     3.589346    8.707244
       _cons |   114.9317   1.173547    97.94   0.000     112.6296    117.2339
------------------------------------------------------------------------------

Make a table with results from reg1 and reg2. Lots of opportunities to make the regression table nicer!

esttab reg1 reg2, title("Joint regression table:")
Joint regression table:
--------------------------------------------
                      (1)             (2)   
                    bwght           bwght   
--------------------------------------------
cigs               -0.514***       -0.506***
                  (-5.68)         (-5.63)   

white                               6.148***
                                   (4.71)   

_cons               119.8***        114.9***
                 (209.27)         (97.94)   
--------------------------------------------
N                    1388            1388   
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Make the same table and save it in the file called regtable1. This file can later be used in a Word document (the rtf format can be opened directly by Word). Note: You can also export to .tex format using the same command but replacing .rtf with .tex.
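A sketch of what such an export command could look like, assuming estout is installed. The names reg1, reg2 and regtable1 follow the text; the auto.dta regressions are placeholders so that the example runs on its own.

```stata
sysuse auto, clear            // stand-in dataset that ships with Stata
regress price mpg
estimates store reg1
regress price mpg weight
estimates store reg2

* "using" sends the table to a file; "replace" overwrites an older version
esttab reg1 reg2 using "regtable1.rtf", replace title("Joint regression table:")
```

The file ends up in the current working directory unless you add a path, for example one built from the “tables” global.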

Use “help esttab” or google if you want to customize your regression results. For example, you can use “estadd scalar” to add a number to your table.
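As an illustration of “estadd scalar”, the sketch below adds the mean of the outcome to a table. It assumes estout is installed and uses auto.dta as a stand-in; the name “meanprice” is made up for the example.

```stata
sysuse auto, clear            // stand-in dataset that ships with Stata
regress price mpg weight

* Add a number to the results before storing them.
* "meanprice" is a made-up name used only for this example.
summarize price
local m = r(mean)
estadd scalar meanprice = `m'
estimates store reg1

* scalars() asks esttab to print the added number below the coefficients
esttab reg1, scalars(meanprice) title("Table with an added scalar:")
```

Adding the scalar before “estimates store” keeps it attached to the stored results.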

2. Produce and export figures

graph twoway scatter LP_agric gdpcapita, name(LP, replace)
graph export "$figures/graph_1.png", replace
graph twoway scatter share_empl_agric gdpcapita, name(empshare, replace)
graph export "$figures/graph_2.png", replace
graph combine LP empshare, cols(2)
graph export "$figures/scatterGDP.png", replace